Problem Statement:¶
In the food industry, identifying food items quickly and accurately is essential for applications such as automated inventory management, calorie estimation, restaurant automation, and dietary monitoring. Manual identification is time-consuming, error-prone, and not scalable. Thus, there is a need for an automated, intelligent system that can classify food items from images with high accuracy.
Context:¶
In the era of digital transformation, automated food detection using computer vision has become increasingly important in various sectors such as hospitality, healthcare, fitness, retail, and food delivery. Accurate identification of food items from images enables intelligent systems to recognize what a person is eating, streamline restaurant operations, or even automate checkout processes in cafeterias.
For example, in a smart cafeteria, cameras can detect and identify food items on a tray without manual input, enabling a frictionless billing experience. In diet and nutrition apps, users can take a picture of their meal, and the app can instantly classify the food and estimate nutritional content. In quality assurance for food production, automated systems can detect if the right type of food is being processed or if items are visually defective.
Such applications demand a robust food classification model capable of identifying food items from images with high accuracy, regardless of variations in presentation, lighting, or camera angles. This project aims to tackle this challenge by leveraging deep learning techniques to train a model that can automatically detect and classify different types of food from a diverse dataset of labeled food images.
Data Descriptions:¶
The project uses a curated subset of the Food-101 dataset, a widely used benchmark for food classification tasks. This dataset includes:
500 images categorized into
10 distinct food classes (e.g., apple_pie, fried_rice, sushi)
Each class contains a balanced distribution of training and test images, generally split in a 70-30 ratio
Images vary in lighting, background, and angle to mimic real-world food photography conditions
Each image is labeled with the corresponding food class, enabling supervised learning approaches to be applied effectively.
Project Objective¶
The primary goal of this project is to:
Develop a deep learning-based food identification model that can accurately classify food items from images.
Key objectives include:
Building a convolutional neural network (CNN) model to classify food images into one of the 10 defined categories
Evaluating model performance using standard metrics such as accuracy, precision, recall, and the confusion matrix.
Enabling a potential real-time application where the trained model can be integrated into camera-based systems for smart kitchens, restaurant automation, or diet-tracking apps
Ultimately, this solution aims to demonstrate the feasibility of intelligent, camera-driven food recognition systems, contributing toward innovations in food technology and AI-driven lifestyle tools.
Step 1: Import the data¶
Importing Required Libraries¶
import os # File and directory operations
import pandas as pd # Data handling
import matplotlib.pyplot as plt # Plotting
import matplotlib.patches as patches # Drawing shapes on plots
import cv2 # Image processing
import numpy as np
Unzipping the Food-101 Dataset¶
# Define the path to the ZIP file containing the dataset
zip_path = 'Food_101.zip'
# Define the directory where the ZIP file should be extracted
extract_to = 'food101_data'
import zipfile # Importing the zipfile module to handle ZIP archives
# Open the ZIP file in read mode ('r') using a context manager
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    # Extract all contents of the ZIP file to the specified directory
    zip_ref.extractall(extract_to)
# Print confirmation message after extraction is complete
print("Dataset unzipped!")
Dataset unzipped!
Exploratory Data Analysis¶
Verify Directory Structure¶
# List all files and directories in the specified path 'extract_to'
# 'extract_to' should be a variable that holds the path where your dataset was extracted
os.listdir(extract_to)
['.DS_Store', '__MACOSX', 'Food_101']
List classes¶
# Join the extraction directory with the 'Food_101' folder to get the full path
food101_dir = os.path.join(extract_to, 'Food_101')
# List all files and subdirectories in the 'Food_101' folder
# This will typically include folders like 'images' and files like 'meta'
os.listdir(food101_dir)
['ice_cream', 'samosa', 'donuts', '.DS_Store', 'waffles', 'falafel', 'ravioli', 'strawberry_shortcake', 'spring_rolls', 'hot_dog', 'apple_pie', 'chocolate_cake', 'tacos', 'pancakes', 'pizza', 'nachos', 'french_fries', 'onion_rings']
base_path = 'food101_data/Food_101/' # path to class folders
class_to_images = {}
for cls_name in os.listdir(base_path):
    cls_folder = os.path.join(base_path, cls_name)
    if os.path.isdir(cls_folder):
        image_files = os.listdir(cls_folder)
        class_to_images[cls_name] = image_files
# Summary
total_images = sum(len(v) for v in class_to_images.values())
print(f"Total classes: {len(class_to_images)}")
print(f"Total images: {total_images}")
Total classes: 17 Total images: 16257
for cls, imgs in class_to_images.items():
    print(f"{cls}: {len(imgs)} images")
ice_cream: 1000 images samosa: 1000 images donuts: 1000 images waffles: 1000 images falafel: 1000 images ravioli: 1000 images strawberry_shortcake: 1000 images spring_rolls: 1000 images hot_dog: 1000 images apple_pie: 257 images chocolate_cake: 1000 images tacos: 1000 images pancakes: 1000 images pizza: 1000 images nachos: 1000 images french_fries: 1000 images onion_rings: 1000 images
Observation :
- Total classes: the current dataset contains 17 different food categories.
- Total images: 16,257 food images in all.
- Uniformity: most classes (pizza, donuts, pancakes, etc.) have 1,000 images each, indicating good class balance.
- Exception: only one class, apple_pie, has fewer images (257), which may cause class imbalance during training.
- The dataset is suitable for multi-class image classification, and can also be extended to object detection once bounding boxes are added.
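A common mitigation for the apple_pie imbalance noted above is to weight the loss per class. The sketch below computes "balanced" class weights (the same heuristic scikit-learn's compute_class_weight uses) with plain NumPy; the toy label array is illustrative, not the real dataset, and the resulting dict is in the shape Keras' model.fit class_weight argument expects:

```python
import numpy as np

# Toy labels mimicking the imbalance above: one class with far fewer samples
labels = np.array(["pizza"] * 1000 + ["apple_pie"] * 257)

classes, counts = np.unique(labels, return_counts=True)  # sorted class names
# "balanced" heuristic: n_samples / (n_classes * count_per_class)
weights = labels.size / (len(classes) * counts)
class_weight = {idx: w for idx, w in enumerate(weights)}

# apple_pie (index 0, alphabetically first) gets the larger weight
print(class_weight)
```

Passing this dict as `class_weight=class_weight` to model.fit makes misclassifying the rare class more costly during training.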
Class Distribution Plot¶
# 1. Class Distribution Plot
classes = list(class_to_images.keys())
counts = [len(imgs) for imgs in class_to_images.values()]
plt.figure(figsize=(12, 6))
plt.bar(classes, counts, color='skyblue')
plt.xticks(rotation=45, ha='right')
plt.xlabel('Food Classes')
plt.ylabel('Number of Images')
plt.title('Number of Images per Food Class')
plt.show()
Observation:
- Most classes contain exactly 1,000 images, which is ideal for training.
- Only one class (apple_pie) has significantly fewer images (257), which may lead to class imbalance during training.
- The dataset is well-suited for image classification tasks.
Image Size Analysis (width and height)¶
# Image Size Analysis (width and height)
import random
from PIL import Image
widths, heights = [], []
for cls, images in class_to_images.items():
    sample_images = random.sample(images, min(20, len(images)))  # sample up to 20 images per class
    for img_name in sample_images:
        img_path = os.path.join(base_path, cls, img_name)
        with Image.open(img_path) as img:
            w, h = img.size
            widths.append(w)
            heights.append(h)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(widths, bins=30, color='salmon', edgecolor='black')
plt.title('Distribution of Image Widths')
plt.xlabel('Width (pixels)')
plt.ylabel('Count')
plt.subplot(1, 2, 2)
plt.hist(heights, bins=30, color='lightgreen', edgecolor='black')
plt.title('Distribution of Image Heights')
plt.xlabel('Height (pixels)')
plt.ylabel('Count')
plt.tight_layout()
plt.show()
Image Size Distribution Observation
- Most images are 512x512 pixels, which indicates the dataset is already quite standardized.
- A few images have smaller dimensions (e.g., 300 or 350 pixels); these outliers occur rarely.
- This consistency is useful for model training: all images can be resized to 512x512 or a smaller fixed size (such as 224x224) for deep learning models.
- No very large or very small images were found, which ensures minimal distortion during preprocessing.
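Resizing to a fixed input shape, as suggested above, can be sketched with Pillow (already used in this notebook for the size analysis). The 224x224 target and the load_and_resize helper are illustrative choices, not part of the original pipeline:

```python
from PIL import Image

TARGET_SIZE = (224, 224)  # (width, height) expected by many CNN backbones

def load_and_resize(path, size=TARGET_SIZE):
    """Open an image, force 3-channel RGB, and resize to a fixed shape."""
    with Image.open(path) as img:
        return img.convert("RGB").resize(size)

# Sanity check on an in-memory image instead of a file from the dataset
dummy = Image.new("RGB", (512, 384))
resized = dummy.resize(TARGET_SIZE)
print(resized.size)  # (224, 224)
```

Aspect ratio is not preserved here; for food photos that is usually acceptable, but letterbox padding is an alternative if distortion matters.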
Visualize the data, showing one image per class¶
# Visualize the data, showing one random image from each class
# Path to dataset
data_dir = food101_dir # Assuming `food101_dir` is already defined
foods_sorted = sorted([
    d for d in os.listdir(data_dir)
    if os.path.isdir(os.path.join(data_dir, d))
])
# Total number of classes
num_classes = len(foods_sorted)
# Dynamically define grid size
cols = 6
rows = int(np.ceil(num_classes / cols))
# Create subplots
fig, ax = plt.subplots(rows, cols, figsize=(4 * cols, 4 * rows))
fig.suptitle("Showing one random image from each class", y=1.02, fontsize=24)
# Flatten axes for easier iteration (in case rows * cols > num_classes)
ax = ax.flatten()
for food_id, food_name in enumerate(foods_sorted):
    food_images = os.listdir(os.path.join(data_dir, food_name))
    random_img = np.random.choice(food_images)
    img_path = os.path.join(data_dir, food_name, random_img)
    img = plt.imread(img_path)
    ax[food_id].imshow(img)
    ax[food_id].set_title(food_name, pad=10)
    ax[food_id].axis('off')
# Hide any extra axes if there are unused subplots
for i in range(num_classes, len(ax)):
    ax[i].axis('off')
plt.tight_layout()
plt.subplots_adjust(top=0.93) # Leave room for suptitle
plt.show()
Step 2: Map training and testing images to their classes.¶
from sklearn.model_selection import train_test_split
# Adjust path as needed
base_path = 'food101_data/Food_101'
# Get class names from folder names
class_names = sorted([folder for folder in os.listdir(base_path) if os.path.isdir(os.path.join(base_path, folder))])
food_data = []
# Collect image path and class label
for label in class_names:
    folder_path = os.path.join(base_path, label)
    for img_file in os.listdir(folder_path):
        if img_file.lower().endswith(('.jpg', '.jpeg', '.png')):
            img_path = os.path.join(folder_path, img_file)
            food_data.append((img_path, label))
# Create DataFrame
food_df = pd.DataFrame(food_data, columns=['image_path', 'label'])
# Split into train/test (80/20)
train_food_df, test_food_df = train_test_split(food_df, test_size=0.2, stratify=food_df['label'], random_state=42)
print("✅ Mapped images to classes.")
print(f"Train: {len(train_food_df)} images, Test: {len(test_food_df)} images")
train_food_df.head()
✅ Mapped images to classes. Train: 13004 images, Test: 3252 images
| image_path | label | |
|---|---|---|
| 2230 | food101_data/Food_101/donuts/2249805.jpg | donuts |
| 12195 | food101_data/Food_101/samosa/1145678.jpg | samosa |
| 13392 | food101_data/Food_101/strawberry_shortcake/225... | strawberry_shortcake |
| 13828 | food101_data/Food_101/strawberry_shortcake/354... | strawberry_shortcake |
| 10269 | food101_data/Food_101/ravioli/788592.jpg | ravioli |
food_df
| image_path | label | |
|---|---|---|
| 0 | food101_data/Food_101/apple_pie/2968812.jpg | apple_pie |
| 1 | food101_data/Food_101/apple_pie/3134347.jpg | apple_pie |
| 2 | food101_data/Food_101/apple_pie/3314985.jpg | apple_pie |
| 3 | food101_data/Food_101/apple_pie/3670548.jpg | apple_pie |
| 4 | food101_data/Food_101/apple_pie/3917257.jpg | apple_pie |
| ... | ... | ... |
| 16251 | food101_data/Food_101/waffles/764669.jpg | waffles |
| 16252 | food101_data/Food_101/waffles/113651.jpg | waffles |
| 16253 | food101_data/Food_101/waffles/2364175.jpg | waffles |
| 16254 | food101_data/Food_101/waffles/3844038.jpg | waffles |
| 16255 | food101_data/Food_101/waffles/1576252.jpg | waffles |
16256 rows × 2 columns
Step 3: Create annotations for training and testing images.¶
[Take any 10 foods(class) of your choice and select any 50 images inside each food and create the annotations manually. You can use any image annotation tool to get the coordinates.]
Image Annotation Overview:
To train a model for object detection (such as YOLO, SSD, or Faster R-CNN), we’ve created annotations for selected food classes. These annotations are saved in a CSV file and follow a structured format suitable for model training.
Annotation Task Details
We selected the following 10 food classes:
- French Fries
- Apple Pie
- Nachos
- Pizza
- Pancakes
- Tacos
- Chocolate Cake
- Hot Dog
- Onion Rings
- Spring Rolls
For each food class, we manually annotated 50-60 images.
We used an image annotation tool (Roboflow) to mark bounding boxes (object locations).
The annotation data is saved in a file: Datasetv1/original_images/_annotations.csv
Annotation File Structure:
The CSV file contains the following columns:
| Column | Description |
|---|---|
| filename | Name of the image file (e.g., pizza_01.jpg) |
| width | Width of the image in pixels |
| height | Height of the image in pixels |
| class | Name of the object class (e.g., pizza, samosa, etc.) |
| xmin | X-coordinate of the top-left corner of the bounding box |
| ymin | Y-coordinate of the top-left corner of the bounding box |
| xmax | X-coordinate of the bottom-right corner of the bounding box |
| ymax | Y-coordinate of the bottom-right corner of the bounding box |
This format is commonly used in object detection datasets to describe the position and size of objects within each image.
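The (xmin, ymin, xmax, ymax) corner convention described above is the Pascal VOC style; detectors such as YOLO instead expect normalized center coordinates. A small conversion sketch (the function name is ours; the example numbers match the first Apple Pie row of the annotation CSV):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert corner coordinates to YOLO's normalized (x_center, y_center, w, h)."""
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h

# Example: the Apple Pie box (210, 43, 397, 259) in a 512x512 image
print(voc_to_yolo(210, 43, 397, 259, 512, 512))
```

Because the outputs are normalized to [0, 1], the same annotation stays valid after the image is resized.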
File & Folder Paths:
Below are the paths used for image data and annotations:
- Path to the annotation file: Datasetv1/original_images/_annotations.csv
- Folder containing the corresponding images: Datasetv1/original_images/
Step 4: Display images with bounding box you have created manually in the previous step.¶
# Path to the CSV file containing image annotations (e.g., bounding boxes, labels)
csv_path = 'Datasetv1/original_images/_annotations.csv'
# Path to the folder where the original images are stored
img_folder = 'Datasetv1/original_images/'
# Load annotations
food_annotations_df = pd.read_csv(csv_path)
# Display the shape of the DataFrame to check the number of rows and columns
food_annotations_df.shape
(558, 8)
# Display the entire DataFrame to inspect the data including any new columns added
food_annotations_df
| filename | width | height | class | xmin | ymin | xmax | ymax | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2909830_jpg.rf.bb9125215f38f22139f72d04f19e693... | 512 | 512 | Apple Pie | 210 | 43 | 397 | 259 |
| 1 | 108743_jpg.rf.260978b4f8ae78f4ebb41f48ef501679... | 512 | 384 | French Fries | 50 | 3 | 442 | 383 |
| 2 | 149278_jpg.rf.86187fd5bd1698133cb7a973c6060449... | 512 | 384 | French Fries | 33 | 0 | 260 | 167 |
| 3 | 2986199_jpg.rf.ac0b99e71100520e6608ef72b12ee27... | 512 | 512 | Apple Pie | 28 | 37 | 291 | 233 |
| 4 | 2934928_jpg.rf.c8f427a0d3e7ba9342fe37276fb15ab... | 512 | 512 | Apple Pie | 9 | 54 | 463 | 465 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 553 | 30292-hotdog_jpg.rf.0390f5521fb9e6e7e3acb2a6a8... | 640 | 640 | Hotdog | 56 | 0 | 640 | 605 |
| 554 | 14043-hotdog_jpg.rf.8336579be067ac62410422f411... | 640 | 640 | Hotdog | 48 | 124 | 404 | 526 |
| 555 | 8006-hotdog_jpg.rf.2b5a43d73a7b80e624c778536e2... | 640 | 640 | Hotdog | 65 | 45 | 623 | 640 |
| 556 | 4345-hotdog_jpg.rf.c81f7d5ae5388487ceea9df4709... | 640 | 640 | Hotdog | 2 | 8 | 640 | 640 |
| 557 | 51643-hotdog_jpg.rf.2eeb177096d2f26e6f38322d53... | 640 | 640 | Hotdog | 3 | 26 | 483 | 607 |
558 rows × 8 columns
# Extract and display all unique food class names from the dataset
food_classes = set(food_annotations_df['class'])
print("List of unique food categories in the dataset:")
for food in sorted(food_classes):
    print("-", food)
List of unique food categories in the dataset: - Apple Pie - Chocolate - French Fries - Hotdog - Nachos - Pizza - onion_rings - pancakes - spring_rolls - tacos
# Check for duplicate filenames in the dataset
duplicate_filenames = food_annotations_df[food_annotations_df.duplicated(subset='filename', keep=False)]
print(f"Total duplicate filenames found: {duplicate_filenames['filename'].nunique()}")
print("List of duplicated filenames:")
print(duplicate_filenames['filename'].value_counts())
Total duplicate filenames found: 32 List of duplicated filenames: filename 189678-nachos_jpg.rf.f186725dbfe1bc23e9532408103e1060.jpg 5 3004621_jpg.rf.1a70aad430f7fcc72cc14f91446d4c08.jpg 4 7394_jpg.rf.1838448cb2b3d641b167b9cfbca600cc.jpg 4 91964_jpg.rf.0c917d27d8f80e5c630140d81031d231.jpg 3 11193_jpg.rf.afefd57ffc19ba1eeb51afeee3bf37b4.jpg 3 113781_jpg.rf.de10ec12748947f00d231f8c55aaefb8.jpg 3 1030289_jpg.rf.702c29c39daf844a889cc73917369bdd.jpg 3 2618003_jpg.rf.8d18399346288665532d0826566a79eb.jpg 3 2861144_jpg.rf.a9287e2d7af886a3c026273c3349edba.jpg 3 36081_jpg.rf.bcde8146b7446e659e5d17e94d563635.jpg 2 1058697_jpg.rf.187204c8e93dbe0d20f8676a3f9f7c33.jpg 2 110171_jpg.rf.2e6a197703f7096765d773f023bda859.jpg 2 38615_jpg.rf.edfc43b51bb448e7763ffc9c6c3237c3.jpg 2 45817_jpg.rf.b4f80dfda9bea5836fedec2c7b65e578.jpg 2 62663_jpg.rf.d6e00a3b034bc15f515a5fa056ca1733.jpg 2 58787_jpg.rf.a8acae7e04404aeb8ad1c6a5f8b65434.jpg 2 145012_jpg.rf.4544abe395055b02ccd3e1076038f4ff.jpg 2 33259_jpg.rf.56a5b0558bdb03c426e60f6b5f89b8f4.jpg 2 78171_jpg.rf.4712e20db14395cc19199a4f927ec652.jpg 2 36370_jpg.rf.fc4e83fc5c0a333ddd949da6ac871995.jpg 2 62484_jpg.rf.7a9effc3895e6123dcf647b7f92549f6.jpg 2 92235_jpg.rf.53c19df7b5c9ec2f9d0ffcad8470c394.jpg 2 35235_jpg.rf.32771ba6dfe7c36611eee12e9a4076b6.jpg 2 71645_jpg.rf.7c1651d6851e2f6b318c16b37516c9e6.jpg 2 2983047_jpg.rf.0581d006429c601c3b014a9e4abe4b5c.jpg 2 74527_jpg.rf.a53136bdf4e575d077f34c3c1a41b50a.jpg 2 110385_jpg.rf.ed897b8ba0e20976351d7e0777963d00.jpg 2 80540_jpg.rf.134fc69263831ead08ce2f8a43ac5644.jpg 2 1126_jpg.rf.d3ba4b55b4bf612e7af22ea7ff137788.jpg 2 68177_jpg.rf.4286d561950cc21283c4e2b372092ac1.jpg 2 101450_jpg.rf.eddcc68593aa541ba3d9cce8835094be.jpg 2 95572_jpg.rf.a47685e871481cef6935b90644ff7ba5.jpg 2 Name: count, dtype: int64
# Remove duplicate rows based on filename, keeping the first occurrence
food_annotations_df = food_annotations_df.drop_duplicates(subset='filename', keep='first').reset_index(drop=True)
print(f"Duplicate rows removed. New shape of DataFrame: {food_annotations_df.shape}")
Duplicate rows removed. New shape of DataFrame: (513, 8)
# Show the distribution of samples across different food classes
class_counts = food_annotations_df['class'].value_counts()
print("Food class distribution (class: count):")
for class_name, count in class_counts.items():
    print(f"- {class_name}: {count}")
Food class distribution (class: count): - pancakes: 57 - spring_rolls: 53 - tacos: 52 - French Fries: 51 - onion_rings: 51 - Pizza: 50 - Nachos: 50 - Chocolate: 50 - Hotdog: 50 - Apple Pie: 49
# Display summary statistics about the dataset
total_annotations = len(food_annotations_df)
unique_images = food_annotations_df['filename'].nunique()
unique_classes = food_annotations_df['class'].nunique()
print("Dataset Summary:")
print(f"- Total annotations : {total_annotations}")
print(f"- Unique image files : {unique_images}")
print(f"- Number of food classes : {unique_classes}")
Dataset Summary: - Total annotations : 513 - Unique image files : 513 - Number of food classes : 10
Observation :
The class-to-index dictionary created during preprocessing (Step 5) maps each food class name to a unique integer index from 0 to 9, following the alphabetical order of class names.
Class names like 'Apple Pie' and 'Chocolate' come first as they are alphabetically earlier.
The mapping is case-sensitive and sorted lexicographically, so lowercase names like 'onion_rings', 'pancakes', 'spring_rolls', and 'tacos' appear after the capitalized ones due to ASCII sorting rules.
This consistent and reproducible mapping is essential for:
Encoding labels during model training.
Decoding predictions back to readable class names.
With 10 classes total, this dictionary covers all classes with unique indices and no duplicates or missing entries.
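The ASCII ordering described above is easy to demonstrate; the class names below mirror this dataset's mixed-case labels:

```python
# Mixed-case class names, as in this dataset's annotation CSV
names = ['tacos', 'Apple Pie', 'onion_rings', 'Chocolate', 'pancakes']

# Uppercase 'A' (65) and 'C' (67) sort before lowercase 'o' (111), 'p' (112), 't' (116)
print(sorted(names))

# The same sorted order drives the class-to-index mapping
class_to_idx = {cls: idx for idx, cls in enumerate(sorted(names))}
print(class_to_idx['Apple Pie'])  # 0
```

Normalizing names (e.g., lowercasing with underscores) before sorting would remove this case sensitivity, at the cost of renaming classes.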
# Function to display bounding boxes for specified classes
def show_bboxes(df, n=5, classes_to_show=None):
    # Filter by class if specified
    if classes_to_show:
        # classes_to_show = [cls.lower().replace(" ", "_") for cls in classes_to_show]
        # df['class'] = df['class'].str.lower()
        filtered_df = df[df['class'].isin(classes_to_show)]
        if filtered_df.empty:
            print(f"⚠️ No images found for classes: {classes_to_show}")
            return
    else:
        filtered_df = df
    img_files = filtered_df['filename'].unique()
    total = min(n, len(img_files))
    # Prepare grid layout (e.g., 5 images in 1 row)
    fig, axes = plt.subplots(1, total, figsize=(5 * total, 5))
    # If only one image, axes is not iterable
    if total == 1:
        axes = [axes]
    for idx in range(total):
        img_file = img_files[idx]
        img_path = os.path.join(img_folder, img_file)
        if not os.path.exists(img_path):
            print(f"❌ Image not found: {img_path}")
            continue
        img = cv2.imread(img_path)
        if img is None:
            print(f"⚠️ Unable to read image: {img_file}")
            continue
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        ax = axes[idx]
        ax.imshow(img_rgb)
        # Draw all boxes for the current image
        for _, row in filtered_df[filtered_df['filename'] == img_file].iterrows():
            x_min, y_min, x_max, y_max = int(row['xmin']), int(row['ymin']), int(row['xmax']), int(row['ymax'])
            label = row['class']
            rect = patches.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min,
                                     linewidth=2, edgecolor='red', facecolor='none')
            ax.add_patch(rect)
            ax.text(x_min, y_min - 5, label, color='red', fontsize=10, backgroundcolor='white')
        ax.axis('off')
        ax.set_title(f"{img_file}", fontsize=10)
    plt.tight_layout()
    plt.show()
# Show 5 images with boxes only for 'Apple Pie'
show_bboxes(food_annotations_df, n=5, classes_to_show=['Apple Pie'])
# Show 5 images with boxes only for 'French Fries'
show_bboxes(food_annotations_df, n=5, classes_to_show=['French Fries'])
# Show 5 images with boxes only for 'pancakes'
show_bboxes(food_annotations_df, n=5, classes_to_show=['pancakes'])
# Show 5 images with boxes only for 'tacos'
show_bboxes(food_annotations_df, n=5, classes_to_show=['tacos'])
# Show 5 images with boxes only for 'Pizza'
show_bboxes(food_annotations_df, n=5, classes_to_show=['Pizza'])
# Show 5 images with boxes only for 'Nachos'
show_bboxes(food_annotations_df, n=5, classes_to_show=['Nachos'])
# Show 5 images with boxes only for 'onion_rings'
show_bboxes(food_annotations_df, n=5, classes_to_show=['onion_rings'])
# Show 5 images with boxes only for 'spring_rolls'
show_bboxes(food_annotations_df, n=5, classes_to_show=['spring_rolls'])
# Show 5 images with boxes only for 'hot_dog'
show_bboxes(food_annotations_df, n=5, classes_to_show=['Hotdog'])
# Show 5 images with boxes only for 'chocolate_cake'
show_bboxes(food_annotations_df, n=5, classes_to_show=['Chocolate'])
Step 5: Design, train and test basic CNN models to classify the food.¶
Utilities Functions¶
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
def train_model(model, X_train, y_train, X_val, y_val, epochs=50, batch_size=32, filepath='model_best.weights.h5'):
    checkpointer = ModelCheckpoint(
        filepath=filepath,
        verbose=1,
        save_best_only=True,
        save_weights_only=True
    )
    earlystopping = EarlyStopping(
        monitor='val_loss',
        min_delta=0.01,
        patience=20,
        mode='auto'
    )
    reduceLR = ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=10,
        mode='auto'
    )
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[checkpointer, reduceLR, earlystopping],
        verbose=1
    )
    return history
import matplotlib.pyplot as plt
import numpy as np
def plot_training_history(history, model, X_test, y_test, model_name="Model"):
    """
    Plot training and validation metrics from model history,
    and evaluate accuracy/loss on test data.

    Args:
        history: History object returned from model.fit()
        model: Trained Keras model
        X_test: Test feature set
        y_test: Test labels
        model_name: Name of the model for the plot title
    """
    # Create figure with two subplots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    # Plot accuracy
    ax1.plot(history.history['accuracy'], label='Training Accuracy')
    ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
    ax1.set_title(f'{model_name} - Accuracy')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Accuracy')
    ax1.legend()
    ax1.grid(True)
    # Plot loss
    ax2.plot(history.history['loss'], label='Training Loss')
    ax2.plot(history.history['val_loss'], label='Validation Loss')
    ax2.set_title(f'{model_name} - Loss')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Loss')
    ax2.legend()
    ax2.grid(True)
    plt.tight_layout()
    plt.show()
    # Print final training/validation metrics
    print("\n🔍 Final Epoch Metrics:")
    print(f"📈 Training Accuracy : {history.history['accuracy'][-1]:.2f}")
    print(f"📉 Training Loss : {history.history['loss'][-1]:.2f}")
    print(f"📈 Validation Accuracy : {history.history['val_accuracy'][-1]:.2f}")
    print(f"📉 Validation Loss : {history.history['val_loss'][-1]:.4f}")
    # Evaluate on test data
    test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f"\n🧪 Test Accuracy : {test_accuracy:.2f}")
    print(f"🧪 Test Loss : {test_loss:.2f}")
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
from pandas import DataFrame
def evaluate_classification_model(model, X_test, y_test, y_train=None):
    """
    Evaluate a classification model: prints classification report and shows confusion matrix.

    Parameters:
    - model: Trained Keras model
    - X_test: Test features
    - y_test: True labels (can be one-hot or class indices)
    - y_train: (Optional) Training labels to ensure LabelEncoder covers all classes
    """
    # Ensure X_test is a NumPy array with dtype float32
    X_test = np.array(X_test).astype(np.float32)
    # Predict class probabilities, then take the argmax as the predicted class index
    y_pred_probs = model.predict(X_test)
    y_pred_class = np.argmax(y_pred_probs, axis=1)
    # Convert y_test to class indices if one-hot encoded
    if y_test.ndim > 1 and y_test.shape[1] > 1:
        y_test_class = np.argmax(y_test, axis=1)
    else:
        y_test_class = y_test.ravel().astype(int)
    # Fit LabelEncoder on combined labels if y_train is provided
    if y_train is not None:
        all_labels = np.concatenate([y_train.ravel(), y_test_class])
    else:
        all_labels = y_test_class
    label_encoder = LabelEncoder()
    label_encoder.fit(all_labels)
    # Decode predicted and true labels to class names
    y_test_labels = label_encoder.inverse_transform(y_test_class.astype(int))
    y_pred_labels = label_encoder.inverse_transform(y_pred_class.astype(int))
    # NOTE: relies on the global food_annotations_df; this alphabetical ordering must
    # match the class_to_idx mapping used when the labels were encoded
    class_names = sorted(food_annotations_df['class'].unique())
    # Print classification report
    print("Classification Report:")
    print(classification_report(y_test_labels, y_pred_labels, target_names=class_names, zero_division=0))
    # Confusion matrix
    conf_mat = confusion_matrix(y_test_class, y_pred_class, labels=label_encoder.classes_)
    # Plot confusion matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(conf_mat, annot=True, fmt='d', xticklabels=class_names, yticklabels=class_names, cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    plt.show()
import random
import matplotlib.pyplot as plt
import numpy as np
def plot_random_predictions(X_test, y_test, class_names, model, num_samples=5):
    """
    Plots random test samples with predicted and actual labels, showing 5 images per row max.
    Correct predictions are shown in green, incorrect in red.

    Args:
        X_test (np.array): Test images, shape (N, H, W, C)
        y_test (np.array): One-hot encoded labels, shape (N, num_classes)
        class_names (list): List of class names corresponding to label indices
        model (keras.Model): Trained classification model
        num_samples (int): Number of random samples to display (default: 5)
    """
    indices = random.sample(range(len(X_test)), num_samples)
    cols = 5
    rows = (num_samples + cols - 1) // cols  # Ceiling division to get rows
    plt.figure(figsize=(cols * 3, rows * 3))  # Adjust figure size
    for i, idx in enumerate(indices):
        img = X_test[idx]
        true_label = np.argmax(y_test[idx])
        pred_label = np.argmax(model.predict(np.expand_dims(img, axis=0), verbose=0))
        color = 'green' if pred_label == true_label else 'red'
        title_text = f"Pred: {class_names[pred_label]}\nActual: {class_names[true_label]}"
        plt.subplot(rows, cols, i + 1)
        plt.imshow(img)
        plt.title(title_text, color=color, fontsize=10)
        plt.axis('off')
    plt.suptitle("Model Predictions on Random Test Images", fontsize=16)
    plt.tight_layout()
    plt.subplots_adjust(top=0.85)  # Make space for suptitle
    plt.show()
Step 5.1.1:Preprocess Data¶
# Import train_test_split to split data into training and testing sets with optional stratification
from sklearn.model_selection import train_test_split
# Import to_categorical to convert integer labels into one-hot encoded format for classification models
from tensorflow.keras.utils import to_categorical
# Import img_to_array to convert PIL Images or numpy arrays to proper array format for model input
from tensorflow.keras.preprocessing.image import img_to_array
# Extract all unique food class names from the 'class' column in the annotations DataFrame,
# then sort them alphabetically to create a consistent ordered list of class names
class_names = sorted(food_annotations_df['class'].unique())
class_names
['Apple Pie', 'Chocolate', 'French Fries', 'Hotdog', 'Nachos', 'Pizza', 'onion_rings', 'pancakes', 'spring_rolls', 'tacos']
# Create a dictionary mapping each class name to a unique integer index,
# where indices correspond to the position of the class name in the sorted list
class_to_idx = {cls: idx for idx, cls in enumerate(class_names)}
class_to_idx
{'Apple Pie': 0,
'Chocolate': 1,
'French Fries': 2,
'Hotdog': 3,
'Nachos': 4,
'Pizza': 5,
'onion_rings': 6,
'pancakes': 7,
'spring_rolls': 8,
'tacos': 9}
# Encode class labels
food_annotations_df['label'] = food_annotations_df['class'].map(class_to_idx)
food_annotations_df
| filename | width | height | class | xmin | ymin | xmax | ymax | label | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2909830_jpg.rf.bb9125215f38f22139f72d04f19e693... | 512 | 512 | Apple Pie | 210 | 43 | 397 | 259 | 0 |
| 1 | 108743_jpg.rf.260978b4f8ae78f4ebb41f48ef501679... | 512 | 384 | French Fries | 50 | 3 | 442 | 383 | 2 |
| 2 | 149278_jpg.rf.86187fd5bd1698133cb7a973c6060449... | 512 | 384 | French Fries | 33 | 0 | 260 | 167 | 2 |
| 3 | 2986199_jpg.rf.ac0b99e71100520e6608ef72b12ee27... | 512 | 512 | Apple Pie | 28 | 37 | 291 | 233 | 0 |
| 4 | 2934928_jpg.rf.c8f427a0d3e7ba9342fe37276fb15ab... | 512 | 512 | Apple Pie | 9 | 54 | 463 | 465 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 508 | 30292-hotdog_jpg.rf.0390f5521fb9e6e7e3acb2a6a8... | 640 | 640 | Hotdog | 56 | 0 | 640 | 605 | 3 |
| 509 | 14043-hotdog_jpg.rf.8336579be067ac62410422f411... | 640 | 640 | Hotdog | 48 | 124 | 404 | 526 | 3 |
| 510 | 8006-hotdog_jpg.rf.2b5a43d73a7b80e624c778536e2... | 640 | 640 | Hotdog | 65 | 45 | 623 | 640 | 3 |
| 511 | 4345-hotdog_jpg.rf.c81f7d5ae5388487ceea9df4709... | 640 | 640 | Hotdog | 2 | 8 | 640 | 640 | 3 |
| 512 | 51643-hotdog_jpg.rf.2eeb177096d2f26e6f38322d53... | 640 | 640 | Hotdog | 3 | 26 | 483 | 607 | 3 |
513 rows × 9 columns
- A new column 'label' is added to the food_annotations_df DataFrame, where each food class name in the 'class' column is replaced with its corresponding integer index from the class_to_idx dictionary. This numeric encoding is necessary for training machine learning models that require labels as integers.
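The integer labels produced by this mapping are typically one-hot encoded before training a softmax classifier. A minimal sketch using a NumPy identity-matrix trick; to_categorical (imported above) produces the same result:

```python
import numpy as np

num_classes = 10
labels = np.array([0, 2, 3, 9])        # integer-encoded class labels, as in the 'label' column
one_hot = np.eye(num_classes)[labels]  # one row of the identity matrix per label

print(one_hot.shape)  # (4, 10)
print(one_hot[0])     # 1.0 at position 0, zeros elsewhere
```

Each row sums to 1, matching the probability vector a softmax output layer is trained against with categorical cross-entropy.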
# --- Load images and corresponding labels ---
img_folder = 'Datasetv1/original_images/'
images = []
labels = []
for _, row in food_annotations_df.iterrows():
    img_path = os.path.join(img_folder, row['filename'])
    img = cv2.imread(img_path)
    if img is not None:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert from BGR to RGB
        img = cv2.resize(img, (128, 128))           # Resize to 128x128
        # img = img_to_array(img) / 255.0           # Normalize to [0, 1]
        images.append(img)
        labels.append(row['class'])
# --- Convert lists of images and labels to NumPy arrays ---
X = np.array(images)
y = np.array(labels)
# Display the shapes of the feature and label arrays
print(f"Shape of image data (X): {X.shape}")
print(f"Shape of label data (y): {y.shape}")
Shape of image data (X): (513, 128, 128, 3) Shape of label data (y): (513,)
y
array(['Apple Pie', 'French Fries', 'French Fries', ..., 'Hotdog',
       'Hotdog', 'Hotdog'], dtype='<U12')
- Verify a few images and their labels after splitting into X (images) and y (target labels), to confirm that each image is paired with the correct label.
import matplotlib.pyplot as plt
import random
# Number of images to display
num_display = 5
# Randomly pick image indices
indices = random.sample(range(len(images)), num_display)
plt.figure(figsize=(15, 5))
for i, idx in enumerate(indices):
    plt.subplot(1, num_display, i + 1)
    plt.imshow(images[idx])
    class_name = y[idx]  # y holds class names directly at this stage
    plt.title(f"Label: {class_name}")
    plt.axis('off')
plt.suptitle("Sample Images with Labels", fontsize=16)
plt.tight_layout()
plt.show()
# Encode labels to integers first
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
# print summary
print("Labels encoded successfully.")
print(f"Number of classes: {len(label_encoder.classes_)}")
Labels encoded successfully. Number of classes: 10
# Get all unique class labels (original) and their encoded values
class_names = label_encoder.classes_
print("Label Mapping (Original Label → Encoded Index):")
for idx, label in enumerate(class_names):
    print(f"{idx}: {label}")
Label Mapping (Original Label → Encoded Index):
0: Apple Pie
1: Chocolate
2: French Fries
3: Hotdog
4: Nachos
5: Pizza
6: onion_rings
7: pancakes
8: spring_rolls
9: tacos
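The ordering above (capitalized names before lowercase ones) follows from `LabelEncoder` sorting class names lexicographically, where uppercase letters precede lowercase in ASCII. A quick check without sklearn, since `sorted()` applies the same ordering:

```python
# sorted() uses the same lexicographic (ASCII) order as LabelEncoder's classes_
labels = ['tacos', 'Apple Pie', 'onion_rings', 'Pizza', 'Apple Pie']
classes = sorted(set(labels))
print(classes)
```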
Train Test Split:¶
# Split into train and temp sets (80% train, 20% temp), with stratification
X_train, X_temp, y_train_encoded, y_temp_encoded = train_test_split(
    X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
)
# Split temp into validation and test (each 10% of total), with stratification
X_valid, X_test, y_valid_encoded, y_test_encoded = train_test_split(
    X_temp, y_temp_encoded, test_size=0.5, random_state=42, stratify=y_temp_encoded
)
# One-hot encode the labels
y_train = to_categorical(y_train_encoded)
y_valid = to_categorical(y_valid_encoded)
y_test = to_categorical(y_test_encoded)
# Print the shapes of the splits
print("Dataset Split Summary:")
print(f"Train set → X: {X_train.shape}, y: {y_train.shape}")
print(f"Validation → X: {X_valid.shape}, y: {y_valid.shape}")
print(f"Test set → X: {X_test.shape}, y: {y_test.shape}")
Dataset Split Summary:
Train set → X: (410, 128, 128, 3), y: (410, 10)
Validation → X: (51, 128, 128, 3), y: (51, 10)
Test set → X: (52, 128, 128, 3), y: (52, 10)
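`to_categorical` turns each integer label into a one-hot row vector, which is why the y shapes above gain a second dimension of 10. An equivalent NumPy-only sketch:

```python
import numpy as np

def one_hot(y, num_classes):
    # One row per sample; a single 1.0 at the label's column
    out = np.zeros((len(y), num_classes), dtype='float32')
    out[np.arange(len(y)), y] = 1.0
    return out

y_enc = np.array([3, 0, 7])
print(one_hot(y_enc, 10))
```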
Verify image-label mapping after splitting¶
print(label_encoder.classes_)
['Apple Pie' 'Chocolate' 'French Fries' 'Hotdog' 'Nachos' 'Pizza' 'onion_rings' 'pancakes' 'spring_rolls' 'tacos']
np.argmax(y_train[1])
7
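The check above can be made explicit: index 7 maps back to 'pancakes' via the class list printed earlier. A self-contained sketch of the round trip, using the mapping from this notebook:

```python
import numpy as np

class_names = np.array(['Apple Pie', 'Chocolate', 'French Fries', 'Hotdog', 'Nachos',
                        'Pizza', 'onion_rings', 'pancakes', 'spring_rolls', 'tacos'])

# A one-hot row like y_train[1], with the 1 at index 7
one_hot_row = np.zeros(10, dtype='float32')
one_hot_row[7] = 1.0

# argmax recovers the integer label; indexing recovers the class name
print(class_names[int(np.argmax(one_hot_row))])
```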
# ------------------------------
# Display Sample Training Images with their Labels
# ------------------------------
def show_samples(X, y, class_names, num_samples=5):
    plt.figure(figsize=(15, 5))
    for i in range(num_samples):
        img = X[i]
        label_idx = np.argmax(y[i])  # Convert one-hot label to index
        plt.subplot(1, num_samples, i + 1)
        plt.imshow(img)
        plt.title(f"Label: {class_names[label_idx]}")
        plt.axis('off')
    plt.suptitle("Sample Training Images with Labels", fontsize=16)
    plt.tight_layout()
    plt.show()
# Call the function
show_samples(X_train, y_train, label_encoder.classes_)
# Check lengths
print(len(X_train), len(y_train)) # Should be equal
print(len(X_test), len(y_test)) # Should be equal
410 410 52 52
# Check label distribution consistency
import numpy as np
import collections
# Convert one-hot encoded labels to class indices
y_train_labels = np.argmax(y_train, axis=1)
y_test_labels = np.argmax(y_test, axis=1)
# Count label distribution
print("Train label distribution:", collections.Counter(y_train_labels))
print("Test label distribution:", collections.Counter(y_test_labels))
Train label distribution: Counter({7: 45, 8: 42, 9: 42, 6: 41, 2: 41, 4: 40, 1: 40, 3: 40, 5: 40, 0: 39})
Test label distribution: Counter({7: 6, 8: 6, 4: 5, 6: 5, 2: 5, 3: 5, 1: 5, 9: 5, 0: 5, 5: 5})
import matplotlib.pyplot as plt
# Count label distribution
train_counts = collections.Counter(y_train_labels)
test_counts = collections.Counter(y_test_labels)
# Sort labels for consistent plotting
labels = sorted(train_counts.keys())
# Get counts in sorted order
train_values = [train_counts[label] for label in labels]
test_values = [test_counts[label] for label in labels]
# Plotting
x = np.arange(len(labels))
width = 0.35
plt.figure(figsize=(12, 6))
plt.bar(x - width/2, train_values, width, label='Train', color='skyblue')
plt.bar(x + width/2, test_values, width, label='Test', color='salmon')
plt.xlabel('Class Label')
plt.ylabel('Number of Samples')
plt.title('Train vs Test Label Distribution')
plt.xticks(x, labels)
plt.legend()
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
Observation:
- Dataset is fairly balanced, which is beneficial for model training, as it reduces the risk of bias toward any particular class.
# ------------------------------
# Display a Random Train Image with its Label
# ------------------------------
import random
import numpy as np
import matplotlib.pyplot as plt
# Pick a random index from the training set
idx = random.randint(0, len(X_train) - 1)
# Convert one-hot encoded label at that index to an integer class index
label_idx = np.argmax(y_train[idx])
# Print the label index to verify which class the image belongs to
print("Label index:", label_idx)
# Display the image at the randomly selected index
plt.imshow(X_train[idx])
# Set the title of the plot to the corresponding class name
plt.title(class_names[label_idx])
# Remove axis ticks and labels for a cleaner display
plt.axis('off')
# Show the image plot
plt.show()
Label index: 2
# ------------------------------
# Display a Random Test Image with its Label
# ------------------------------
import random
import numpy as np
import matplotlib.pyplot as plt
# Pick a random index from the test set
idx = random.randint(0, len(X_test) - 1)
# Convert one-hot encoded label at that index to an integer class index
label_idx = np.argmax(y_test[idx])
# Print the label index to verify which class the image belongs to
print("Label index:", label_idx)
# Display the image at the randomly selected index
plt.imshow(X_test[idx])
# Set the title of the plot to the corresponding class name
plt.title(class_names[label_idx])
# Remove axis ticks and labels for a cleaner display
plt.axis('off')
# Show the image plot
plt.show()
Label index: 6
Step 5.1.2: Build a Basic CNN¶
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.layers import Input
from tensorflow.keras.optimizers import Adam
# Define a simple CNN model for multi-class classification
basic_cnn_model_1 = Sequential([
    Input(shape=(128, 128, 3)),  # Input layer specifying image size and channels (RGB)
    # First convolution + pooling block
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Second convolution + pooling block
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Third convolution + pooling block
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Flatten feature maps to a 1D vector for dense layers
    Flatten(),
    # Fully connected layer with 128 neurons
    Dense(128, activation='relu'),
    Dropout(0.5),  # Dropout for regularization to prevent overfitting
    # Output layer with number of classes and softmax activation
    Dense(len(class_names), activation='softmax')
])
# Compile the model with Adam optimizer, categorical crossentropy loss for multi-class, and accuracy metric
basic_cnn_model_1.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
# Print model architecture summary
basic_cnn_model_1.summary()
Model: "sequential_67"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_260 (Conv2D) │ (None, 126, 126, 32) │ 896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_253 │ (None, 63, 63, 32) │ 0 │ │ (MaxPooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_261 (Conv2D) │ (None, 61, 61, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_254 │ (None, 30, 30, 64) │ 0 │ │ (MaxPooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_262 (Conv2D) │ (None, 28, 28, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_255 │ (None, 14, 14, 128) │ 0 │ │ (MaxPooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_50 (Flatten) │ (None, 25088) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_135 (Dense) │ (None, 128) │ 3,211,392 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_182 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_136 (Dense) │ (None, 10) │ 1,290 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 3,305,930 (12.61 MB)
Trainable params: 3,305,930 (12.61 MB)
Non-trainable params: 0 (0.00 B)
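The parameter counts in the summary can be verified by hand: a Conv2D layer has (kh·kw·in_channels + 1)·filters parameters (the +1 is the per-filter bias), and a Dense layer has (inputs + 1)·units. A quick check against the table above:

```python
def conv_params(kh, kw, in_ch, filters):
    # Weights per filter (kh*kw*in_ch) plus one bias per filter
    return (kh * kw * in_ch + 1) * filters

def dense_params(inputs, units):
    # One weight per input per unit, plus one bias per unit
    return (inputs + 1) * units

print(conv_params(3, 3, 3, 32))     # first Conv2D  → 896
print(conv_params(3, 3, 32, 64))    # second Conv2D → 18,496
print(conv_params(3, 3, 64, 128))   # third Conv2D  → 73,856
print(dense_params(25088, 128))     # Dense after Flatten → 3,211,392
print(dense_params(128, 10))        # output Dense  → 1,290
```

The Dense layer after Flatten dominates the total (3.2M of 3.3M parameters), which is why the second model replaces Flatten with global average pooling.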
Step 5.1.3: Train the Model¶
# Train the model
history = train_model(basic_cnn_model_1, X_train, y_train, X_valid, y_valid, epochs=20, batch_size=16)
Epoch 1/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - accuracy: 0.2579 - loss: 2.0964 Epoch 1: val_loss improved from inf to 2.23658, saving model to model_best.weights.h5 26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 63ms/step - accuracy: 0.2568 - loss: 2.0971 - val_accuracy: 0.1765 - val_loss: 2.2366 - learning_rate: 2.5000e-04 Epoch 2/20 26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.3203 - loss: 1.9839 Epoch 2: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 53ms/step - accuracy: 0.3204 - loss: 1.9822 - val_accuracy: 0.1569 - val_loss: 2.2678 - learning_rate: 2.5000e-04 Epoch 3/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.4405 - loss: 1.7554 Epoch 3: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 53ms/step - accuracy: 0.4404 - loss: 1.7515 - val_accuracy: 0.1569 - val_loss: 2.3056 - learning_rate: 2.5000e-04 Epoch 4/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step - accuracy: 0.4877 - loss: 1.5367 Epoch 4: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 56ms/step - accuracy: 0.4904 - loss: 1.5339 - val_accuracy: 0.1373 - val_loss: 2.3733 - learning_rate: 2.5000e-04 Epoch 5/20 26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.6255 - loss: 1.2175 Epoch 5: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.6248 - loss: 1.2186 - val_accuracy: 0.1569 - val_loss: 2.4882 - learning_rate: 2.5000e-04 Epoch 6/20 26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.6634 - loss: 1.0593 Epoch 6: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.6643 - loss: 1.0581 - val_accuracy: 0.2353 - val_loss: 2.7572 - learning_rate: 2.5000e-04 Epoch 7/20 26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.7401 - loss: 0.8608 Epoch 7: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 50ms/step - accuracy: 0.7395 - loss: 0.8611 - val_accuracy: 0.1961 - val_loss: 3.0568 - learning_rate: 2.5000e-04 Epoch 8/20 25/26 
━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.8043 - loss: 0.7184 Epoch 8: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.8060 - loss: 0.7156 - val_accuracy: 0.1569 - val_loss: 3.6817 - learning_rate: 2.5000e-04 Epoch 9/20 26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.8389 - loss: 0.6055 Epoch 9: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.8388 - loss: 0.6054 - val_accuracy: 0.1373 - val_loss: 3.5692 - learning_rate: 2.5000e-04 Epoch 10/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.8842 - loss: 0.4877 Epoch 10: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 55ms/step - accuracy: 0.8846 - loss: 0.4838 - val_accuracy: 0.1373 - val_loss: 4.1033 - learning_rate: 2.5000e-04 Epoch 11/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.8958 - loss: 0.3359 Epoch 11: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.8956 - loss: 0.3397 - val_accuracy: 0.1765 - val_loss: 3.5208 - learning_rate: 2.5000e-04 Epoch 12/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.9099 - loss: 0.3405 Epoch 12: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.9101 - loss: 0.3392 - val_accuracy: 0.1765 - val_loss: 3.7455 - learning_rate: 1.2500e-04 Epoch 13/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.9513 - loss: 0.2061 Epoch 13: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 52ms/step - accuracy: 0.9506 - loss: 0.2076 - val_accuracy: 0.0980 - val_loss: 4.4924 - learning_rate: 1.2500e-04 Epoch 14/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.9460 - loss: 0.2314 Epoch 14: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.9460 - loss: 0.2312 - val_accuracy: 0.1373 - val_loss: 5.0421 - learning_rate: 1.2500e-04 Epoch 15/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.9699 - 
loss: 0.1522 Epoch 15: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 52ms/step - accuracy: 0.9698 - loss: 0.1532 - val_accuracy: 0.0784 - val_loss: 4.6256 - learning_rate: 1.2500e-04 Epoch 16/20 26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9668 - loss: 0.1342 Epoch 16: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.9665 - loss: 0.1352 - val_accuracy: 0.1176 - val_loss: 4.5482 - learning_rate: 1.2500e-04 Epoch 17/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.9572 - loss: 0.1650 Epoch 17: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.9577 - loss: 0.1632 - val_accuracy: 0.1176 - val_loss: 4.9174 - learning_rate: 1.2500e-04 Epoch 18/20 26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.9586 - loss: 0.1724 Epoch 18: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.9586 - loss: 0.1723 - val_accuracy: 0.1176 - val_loss: 4.3938 - learning_rate: 1.2500e-04 Epoch 19/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.9883 - loss: 0.0975 Epoch 19: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 52ms/step - accuracy: 0.9881 - loss: 0.0966 - val_accuracy: 0.1765 - val_loss: 5.4650 - learning_rate: 1.2500e-04 Epoch 20/20 25/26 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.9696 - loss: 0.1042 Epoch 20: val_loss did not improve from 2.23658 26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 50ms/step - accuracy: 0.9703 - loss: 0.1042 - val_accuracy: 0.1765 - val_loss: 5.5215 - learning_rate: 1.2500e-04
Model Evaluation & Visualization¶
# Plot training history and evaluate the model on test data
# ---------------------------------------------------------
# history : The training history object returned by model.fit(), containing loss and accuracy over epochs
# basic_cnn_model_1 : The trained Keras model to be evaluated
# X_test, y_test : Test dataset used to evaluate model performance after training
# model_name : (Optional) Custom name for title/labeling plots and saving figures
plot_training_history(history, basic_cnn_model_1, X_test, y_test, model_name="Basic CNN 1")
🔍 Final Epoch Metrics:
📈 Training Accuracy   : 0.98
📉 Training Loss       : 0.10
📈 Validation Accuracy : 0.18
📉 Validation Loss     : 5.5215
🧪 Test Accuracy       : 0.23
🧪 Test Loss           : 4.43
Based on the final epoch metrics, test performance, and the epoch-wise training logs, here are observations on the model's training behavior:
The large gap between training and validation/test accuracy, together with the diverging loss values, indicates severe overfitting: the model memorizes the training data but fails to generalize.
Observation:
- Training accuracy climbs steadily, reaching ~97% by epoch 20.
- Validation accuracy never exceeds ~24%, and validation loss rises from epoch 2 onward, showing overfitting almost immediately.
- The best generalization (lowest validation loss, 2.237) is observed at epoch 1, which is the checkpoint that was saved.
Conclusion:
Poor generalization on unseen data is confirmed by the low test accuracy (0.23) and high test loss (4.43).
Summary of Issues
| Problem | Evidence |
|---|---|
| Overfitting | High train accuracy (~0.97) vs low val/test accuracy (~0.18-0.23) |
| Poor generalization | Test accuracy (0.23) barely above the 10% chance level for 10 classes |
| Validation loss rise | Val loss increases from epoch 2 while train loss keeps falling |
| Model complexity | ~3.3M parameters fit 410 training images too quickly |
# Evaluate the trained classification model on the test set
# ----------------------------------------------------------
# basic_cnn_model_1 : The trained Keras model that will be evaluated
# X_test : Test feature data (e.g., images) for model prediction
# y_test : True labels (can be one-hot encoded or class indices) for evaluating predictions
# y_train : Optional — training labels used to fit LabelEncoder on all classes (helps preserve class label mapping)
evaluate_classification_model(basic_cnn_model_1, X_test, y_test, y_train=y_train)
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step
Classification Report:
              precision    recall  f1-score   support
   Apple Pie       0.12      0.20      0.15         5
   Chocolate       0.20      0.20      0.20         5
French Fries       0.50      0.40      0.44         5
      Hotdog       0.00      0.00      0.00         5
      Nachos       0.00      0.00      0.00         5
       Pizza       0.33      0.20      0.25         5
 onion_rings       0.43      0.60      0.50         5
    pancakes       0.33      0.33      0.33         6
spring_rolls       0.20      0.17      0.18         6
       tacos       0.17      0.20      0.18         5

    accuracy                           0.23        52
   macro avg       0.23      0.23      0.22        52
weighted avg       0.23      0.23      0.23        52
# Visualize predictions on random test images
# Arguments:
# - X_test : array of test images (preprocessed, shape like (N, H, W, C))
# - y_test : one-hot encoded true labels for test images
# - class_names : list of class label names corresponding to indices
# - basic_cnn_model_1 : trained classification model
# - num_samples : number of random samples to display (default is 5)
plot_random_predictions(X_test, y_test, class_names, basic_cnn_model_1, num_samples=20)
Classification Report Summary:
- Overall accuracy is low (~23%), showing the model struggles to predict most classes correctly.
- Most classes have poor precision, recall, and F1-scores; the highest F1 is ~0.50 (onion_rings), followed by French Fries (~0.44).
- Two classes (Hotdog and Nachos) have zero precision and recall, meaning not a single correct prediction.
- The model overfits the small training set rather than learning discriminative, generalizable features.

Recommendations:
- Increase the dataset size or balance the classes.
- Apply data augmentation.
- Use class weighting or sampling strategies.
- Tune hyperparameters or try more powerful models (e.g., transfer learning).
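As a lightweight example of the data-augmentation recommendation, the training set can be doubled with horizontal flips, which preserve food labels. This is a NumPy-only sketch with toy arrays; in practice Keras augmentation layers or generators would be the usual route:

```python
import numpy as np

def augment_hflip(X, y):
    # Mirror each image left-right (axis 2 is width in NHWC layout)
    X_flipped = X[:, :, ::-1, :]
    # Flipped copies keep the same labels
    return np.concatenate([X, X_flipped]), np.concatenate([y, y])

X_toy = np.random.rand(4, 128, 128, 3).astype('float32')
y_toy = np.eye(10, dtype='float32')[[0, 3, 7, 9]]  # one-hot toy labels
X_aug, y_aug = augment_hflip(X_toy, y_toy)
print(X_aug.shape, y_aug.shape)
```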
Observations on Confusion Matrix
The confusion matrix (diagonal = correct predictions) is consistent with the classification report:
Best Predictions:
- "onion_rings" is predicted best (3 of 5 correct), followed by "French Fries" and "pancakes" (2 correct each).
Common Mistakes:
- "Hotdog" and "Nachos" are never predicted correctly; their samples are scattered across other classes.
- Visually similar fried foods ("French Fries", "onion_rings", "spring_rolls") appear to be confused with each other.
Overall Performance:
- With at most 3 correct predictions in any class, the model has not learned features that reliably separate these foods and needs substantial improvement for better accuracy.
Step 5.2: Build Basic CNN 2 (Improved CNN)¶
import numpy as np
import matplotlib.pyplot as plt
# Convert one-hot encoded label at index 0 to class index
data_index = 300
idx = np.argmax(y_train[data_index])
# Print the index and corresponding class label
print("Sample class index:", idx)
print("Corresponding label:", class_names[idx])
plt.imshow(X_train[data_index])
plt.title(class_names[idx])
plt.axis('off')
plt.show()
Sample class index: 3
Corresponding label: Hotdog
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense, Dropout, BatchNormalization, Input
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
basic_cnn_model_2 = Sequential([
    Input(shape=(128, 128, 3)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    GlobalAveragePooling2D(),
    Dense(128, activation='relu', kernel_regularizer=l2(0.001)),
    Dropout(0.5),
    Dense(len(class_names), activation='softmax')
])
basic_cnn_model_2.compile(optimizer=Adam(learning_rate=1e-4),
                          loss='categorical_crossentropy',
                          metrics=['accuracy'])
basic_cnn_model_2.summary()
Model: "sequential_73"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_278 (Conv2D) │ (None, 128, 128, 32) │ 896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_176 │ (None, 128, 128, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_271 │ (None, 64, 64, 32) │ 0 │ │ (MaxPooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_279 (Conv2D) │ (None, 64, 64, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_177 │ (None, 64, 64, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_272 │ (None, 32, 32, 64) │ 0 │ │ (MaxPooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_280 (Conv2D) │ (None, 32, 32, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_178 │ (None, 32, 32, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_273 │ (None, 16, 16, 64) │ 0 │ │ (MaxPooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_21 │ (None, 64) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_147 (Dense) │ (None, 128) │ 8,320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_191 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_148 (Dense) │ (None, 10) │ 1,290 │ 
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 66,570 (260.04 KB)
Trainable params: 66,250 (258.79 KB)
Non-trainable params: 320 (1.25 KB)
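The large parameter drop versus the first model (~66K vs ~3.3M) comes from replacing Flatten with GlobalAveragePooling2D, which averages each feature map down to a single number before the Dense layer. A NumPy sketch of what the layer computes:

```python
import numpy as np

# Toy feature maps shaped (batch, height, width, channels),
# like the output of the last pooling block
fmap = np.random.rand(2, 16, 16, 64).astype('float32')

# GlobalAveragePooling2D == mean over the spatial axes → (batch, channels)
gap = fmap.mean(axis=(1, 2))
print(gap.shape)  # the Dense layer then sees 64 inputs instead of 16*16*64
```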
basic_cnn_model_2_history = train_model(basic_cnn_model_2, X_train, y_train, X_valid, y_valid, epochs=20, batch_size=16, filepath='basic_cnn_model_2_best.weights.h5')
Epoch 1/20
Epoch 1: val_loss improved from inf to 2.74920, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 3s 77ms/step - accuracy: 0.0918 - loss: 2.6050 - val_accuracy: 0.1373 - val_loss: 2.7492 - learning_rate: 1.0000e-04
Epoch 2/20
Epoch 2: val_loss improved from 2.74920 to 2.50387, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 72ms/step - accuracy: 0.0918 - loss: 2.5261 - val_accuracy: 0.1373 - val_loss: 2.5039 - learning_rate: 1.0000e-04
Epoch 3/20
Epoch 3: val_loss improved from 2.50387 to 2.43262, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 77ms/step - accuracy: 0.1167 - loss: 2.4307 - val_accuracy: 0.0980 - val_loss: 2.4326 - learning_rate: 1.0000e-04
Epoch 4/20
Epoch 4: val_loss improved from 2.43262 to 2.38982, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 71ms/step - accuracy: 0.1433 - loss: 2.4101 - val_accuracy: 0.1176 - val_loss: 2.3898 - learning_rate: 1.0000e-04
Epoch 5/20
Epoch 5: val_loss improved from 2.38982 to 2.36039, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 71ms/step - accuracy: 0.1311 - loss: 2.4269 - val_accuracy: 0.1176 - val_loss: 2.3604 - learning_rate: 1.0000e-04
Epoch 6/20
Epoch 6: val_loss improved from 2.36039 to 2.35024, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 71ms/step - accuracy: 0.1820 - loss: 2.2447 - val_accuracy: 0.1569 - val_loss: 2.3502 - learning_rate: 1.0000e-04
Epoch 7/20
Epoch 7: val_loss improved from 2.35024 to 2.33138, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 86ms/step - accuracy: 0.1645 - loss: 2.2910 - val_accuracy: 0.2549 - val_loss: 2.3314 - learning_rate: 1.0000e-04
Epoch 8/20
Epoch 8: val_loss improved from 2.33138 to 2.29568, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 77ms/step - accuracy: 0.2427 - loss: 2.3215 - val_accuracy: 0.2157 - val_loss: 2.2957 - learning_rate: 1.0000e-04
Epoch 9/20
Epoch 9: val_loss improved from 2.29568 to 2.27044, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 74ms/step - accuracy: 0.2171 - loss: 2.2536 - val_accuracy: 0.2549 - val_loss: 2.2704 - learning_rate: 1.0000e-04
Epoch 10/20
Epoch 10: val_loss improved from 2.27044 to 2.26747, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 71ms/step - accuracy: 0.2209 - loss: 2.1928 - val_accuracy: 0.2549 - val_loss: 2.2675 - learning_rate: 1.0000e-04
Epoch 11/20
Epoch 11: val_loss improved from 2.26747 to 2.24480, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 74ms/step - accuracy: 0.2007 - loss: 2.2219 - val_accuracy: 0.2745 - val_loss: 2.2448 - learning_rate: 1.0000e-04
Epoch 12/20
Epoch 12: val_loss improved from 2.24480 to 2.22419, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 79ms/step - accuracy: 0.2630 - loss: 2.1508 - val_accuracy: 0.2353 - val_loss: 2.2242 - learning_rate: 1.0000e-04
Epoch 13/20
Epoch 13: val_loss improved from 2.22419 to 2.21153, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 72ms/step - accuracy: 0.2237 - loss: 2.2109 - val_accuracy: 0.2549 - val_loss: 2.2115 - learning_rate: 1.0000e-04
Epoch 14/20
Epoch 14: val_loss improved from 2.21153 to 2.20730, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 72ms/step - accuracy: 0.2779 - loss: 2.1116 - val_accuracy: 0.2745 - val_loss: 2.2073 - learning_rate: 1.0000e-04
Epoch 15/20
Epoch 15: val_loss improved from 2.20730 to 2.19979, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 73ms/step - accuracy: 0.2969 - loss: 2.1514 - val_accuracy: 0.2157 - val_loss: 2.1998 - learning_rate: 1.0000e-04
Epoch 16/20
Epoch 16: val_loss improved from 2.19979 to 2.19116, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 73ms/step - accuracy: 0.2843 - loss: 2.1491 - val_accuracy: 0.2157 - val_loss: 2.1912 - learning_rate: 1.0000e-04
Epoch 17/20
Epoch 17: val_loss improved from 2.19116 to 2.19076, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 72ms/step - accuracy: 0.2809 - loss: 2.0935 - val_accuracy: 0.2157 - val_loss: 2.1908 - learning_rate: 1.0000e-04
Epoch 18/20
Epoch 18: val_loss improved from 2.19076 to 2.15424, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 72ms/step - accuracy: 0.2982 - loss: 2.1121 - val_accuracy: 0.2549 - val_loss: 2.1542 - learning_rate: 1.0000e-04
Epoch 19/20
Epoch 19: val_loss improved from 2.15424 to 2.15281, saving model to basic_cnn_model_2_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 72ms/step - accuracy: 0.2640 - loss: 2.0656 - val_accuracy: 0.2549 - val_loss: 2.1528 - learning_rate: 1.0000e-04
Epoch 20/20
Epoch 20: val_loss did not improve from 2.15281
26/26 ━━━━━━━━━━━━━━━━━━━━ 2s 71ms/step - accuracy: 0.2541 - loss: 2.1379 - val_accuracy: 0.2745 - val_loss: 2.1570 - learning_rate: 1.0000e-04
# batch_images, batch_labels = next(train_generator)
# plt.imshow(batch_images[15]) # visualize
# print(batch_labels[15])
# print(np.argmax(batch_labels[15])) # check label
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] 2
plot_training_history(basic_cnn_model_2_history, basic_cnn_model_2, X_test, y_test, model_name="Basic CNN 2")
🔍 Final Epoch Metrics:
📈 Training Accuracy   : 0.28
📉 Training Loss       : 2.11
📈 Validation Accuracy : 0.27
📉 Validation Loss     : 2.1570
🧪 Test Accuracy       : 0.29
🧪 Test Loss           : 2.12
# Evaluate the trained classification model on the test set
# ----------------------------------------------------------
# basic_cnn_model_2 : The trained Keras model that will be evaluated
# X_test : Test feature data (e.g., images) for model prediction
# y_test : True labels (can be one-hot encoded or class indices) for evaluating predictions
# y_train : Optional — training labels used to fit LabelEncoder on all classes (helps preserve class label mapping)
evaluate_classification_model(basic_cnn_model_2, X_test, y_test, y_train=y_train)
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step
Classification Report:
              precision    recall  f1-score   support

   Apple Pie       0.50      0.20      0.29         5
   Chocolate       0.30      0.60      0.40         5
French Fries       0.00      0.00      0.00         5
      Hotdog       0.33      0.20      0.25         5
      Nachos       0.20      0.20      0.20         5
       Pizza       0.40      0.40      0.40         5
 onion_rings       0.25      0.40      0.31         5
    pancakes       0.67      0.33      0.44         6
spring_rolls       0.23      0.50      0.32         6
       tacos       0.00      0.00      0.00         5

    accuracy                           0.29        52
   macro avg       0.29      0.28      0.26        52
weighted avg       0.29      0.29      0.26        52
test_loss, test_acc = basic_cnn_model_2.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc * 100:.2f}%")
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - accuracy: 0.2965 - loss: 2.1394 Test Accuracy: 28.85%
# Visualize predictions on random test images
# Arguments:
# - X_test : array of test images (preprocessed, shape like (N, H, W, C))
# - y_test : one-hot encoded true labels for test images
# - class_names : list of class label names corresponding to indices
# - basic_cnn_model_2 : trained classification model
# - num_samples : number of random samples to display (default is 5)
plot_random_predictions(X_test, y_test, class_names, basic_cnn_model_2, num_samples=20)
CNN Model Comparison Report¶
Overview¶
| Aspect | Model 1 | Model 2 |
| --------------- | ------------------------- | ---------------------------------- |
| Architecture | Basic CNN (3 conv layers) | Deep CNN with 5 conv blocks |
| Overfitting | Yes – severe | No – well-regularized |
| Regularization | Dropout only | Dropout + BatchNorm + Augmentation |
Detailed Observations¶
1. Underfitting in Both Models¶
- Both models achieved low training and validation accuracy.
- Indicates that the architectures are too simple to learn rich image features.
2. Low Precision, Recall, and F1-Scores¶
- Many classes show poor recall (0.0), meaning the models miss actual class instances.
- Some classes, such as `pancakes`, `chocolate`, and `spring_rolls`, performed relatively better due to simpler or more distinctive features.
3. Inconsistent Class Predictions¶
- Class imbalance and very small test set (5–6 samples per class) caused volatile evaluation results.
- Some classes had zero correct predictions.
4. Slow Learning¶
- Gradual increase in training accuracy across 20 epochs.
- Suggests the learning rate is appropriate, but the model complexity is insufficient.
Model Architecture Feedback¶
Common Weaknesses:¶
- Shallow depth with limited filters.
- Lacking normalization layers.
- Flatten layer may cause overfitting to spatial locations.
Improvements Made in Model 2:¶
- Despite these enhancements, performance did not significantly improve.
- Indicates that model complexity needs more robust changes.
Unified Recommendations¶
Model Enhancements:¶
- Add more `Conv2D` blocks with higher filter counts.
- Use `BatchNormalization` and `Dropout` after each block.
- Replace `Flatten` with `GlobalAveragePooling2D`.
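A minimal sketch of these layer changes (the input size, filter counts, and dropout rate below are illustrative assumptions, not values taken from this notebook's models):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_improved_cnn(num_classes=10, input_shape=(224, 224, 3)):
    """Deeper CNN sketch: stacked Conv2D blocks with BatchNorm/Dropout and a GAP head."""
    model = models.Sequential([layers.Input(shape=input_shape)])
    # Stack Conv2D blocks with increasing filter counts; each block is
    # followed by BatchNormalization, pooling, and Dropout.
    for filters in (32, 64, 128, 256):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D())
        model.add(layers.Dropout(0.25))
    # GlobalAveragePooling2D replaces Flatten: far fewer parameters in the
    # head and no dependence on exact spatial feature positions.
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_improved_cnn()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```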
Data Handling:¶
- Apply aggressive data augmentation using `ImageDataGenerator` (rotation, zoom, shift, flip, brightness, etc.).
- Upsample or use class weights to handle imbalance.
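Both ideas could be sketched as follows; the `y_train_int` array here is a stand-in for the project's integer class labels, used only to show the shape of the computation:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Aggressive augmentation: rotation, zoom, shifts, flips, brightness.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=30,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    brightness_range=(0.8, 1.2),
)

# Balanced class weights from integer labels; pass the resulting dict to
# model.fit(..., class_weight=class_weight).
y_train_int = np.array([0, 0, 1, 2, 2, 2])  # stand-in labels for illustration
weights = compute_class_weight("balanced",
                               classes=np.unique(y_train_int),
                               y=y_train_int)
class_weight = dict(enumerate(weights))
# Under-represented classes receive proportionally larger weights.
```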
Training Strategy:¶
- Train for 50–100 epochs with `EarlyStopping` and `ReduceLROnPlateau`.
- Use learning-rate warm-up or scheduling for smoother convergence.
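A possible callback setup for this strategy (the patience values and reduction factor are illustrative choices, not tuned for this dataset):

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    # Stop once val_loss stalls and roll back to the best weights seen.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Halve the learning rate whenever val_loss plateaus for 3 epochs.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6),
]
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=callbacks)
```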
Upgrade to Transfer Learning:¶
- Use a pretrained CNN (e.g., MobileNetV2, EfficientNet, ResNet50).
- Fine-tune only the last few layers.
- Works well on small datasets with limited computation.
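One way this could look with MobileNetV2 as the frozen backbone. `weights=None` is used here only so the sketch runs without downloading weights; in practice you would pass `weights="imagenet"`:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Pretrained backbone; use weights="imagenet" in practice (weights=None
# here only avoids the download in this sketch).
base = MobileNetV2(include_top=False, weights=None, input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone, train only the new head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(10, activation="softmax"),  # 10 food classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```

To fine-tune the last few layers, set `base.trainable = True` after the new head converges, re-freeze all but the final blocks, and continue training at a lower learning rate.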
Conclusion¶
Both models show initial promise but are severely limited by:
- Architectural simplicity
- Insufficient data representation
- Low diversity in training samples
To progress meaningfully, we recommend moving toward transfer learning, stronger architectures, and better data engineering.